Transposon mediated multiplex sequencing

ABSTRACT

The present invention relates to an automated method of transposon-mediated multiplex sequencing of DNA fragments inserted into a vector. It relates more particularly to an increased efficiency in such automated methods, where the increased efficiency is obtained by screening out, before sequencing, those constructs in which the transposon inserted into the vector sequence. This prevents a waste of time and resources in performing reactions that sequence the vector instead of the DNA fragments of interest.

FIELD OF THE INVENTION

[0001] The present invention relates to an automated method of transposon-mediated multiplex sequencing of DNA fragments inserted into a vector. It relates more particularly to an increased efficiency in such automated methods, where the increased efficiency is obtained by screening out before the sequencing those constructs in which the transposon inserted into the vector sequence. This prevents a waste of time and resources in performing reactions sequencing the vector instead of the DNA fragments of interest.

BACKGROUND OF THE INVENTION

[0002] The enormous wealth of information that has been acquired from genomic and expressed sequence tag (EST) sequencing in the last 10 years has contributed significantly to efforts to clone full-length cDNA representatives. Although it is anticipated that genomic sequencing projects from human and mouse will be completed in the near future, the transcriptome of these species will remain ambiguous for some time. The complexities involved in predicting, with complete certainty, the splicing program of mRNAs from genomic sequences have compelled additional genomic research focused on obtaining the sequences of full-length cDNAs. In addition, full-length cDNA sequencing efforts are also required for the confirmation of cDNA sequences after methods that involve amplification of the cDNA have been employed for cloning. This scenario is particularly prevalent in genomic centers that are focused on validating gene targets for drug discovery efforts. Clearly, after great expense and effort have been expended, it would be senseless for a putative target to fail the validation process simply because the coding sequence of the target gene was incorrect. Therefore, approaches are required at genomic centers to sequence large numbers of full-length clones quickly, inexpensively, and accurately. For this purpose, Applicants have created a new, integrated high-throughput process called transposon expedited multiplex sequencing (TEMS).

[0003] In the last 20 years, many methods have been developed for sequencing large inserts in plasmids. However, many of these methods were cumbersome and could not be transferred to high-throughput, automated systems. Sequencing by primer walking is slow, expensive, and often fails since primers are designed to the sequence in a poorly characterized region. Similarly, sequencing by creating a collection of clones by exonuclease digestion from the ends of the target clone is slow, clone specific, and extremely sensitive to the purity and integrity of the template DNA. In addition, the success rate using this approach is quite variable. Shotgun sequencing of clones is a higher-throughput method; however, it requires isolating the insert from each clone and then recloning smaller fragments generated by a wide variety of methods. Additionally, shotgun libraries that utilize restriction digests result in a cloning bias and subsequently a non-random distribution of DNA sequence data.

[0004] Transposon-mediated sequencing can be done by pooling a large number of vectors containing target DNA sequences and randomly inserting a transposon with sequencing primers on each end into the constructs. See Devine, S. E., Boeke, J. D. (1994) Efficient integration of artificial transposons into plasmid targets in vitro: a useful tool for DNA mapping, sequencing and genetic analysis, Nucleic Acids Research, pp. 3765-3772; or Kimmel, B., M. J. Palazzola, C. Martin, J. D. Boeke, and S. E. Devine, 1997, Transposon-mediated DNA sequencing. In Genomic Analysis: A laboratory manual (ed. E Green, B. Birren, R. Myers, and P. Hieter), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., for a description of the method. Traditionally, this method was cumbersome since it required moving plasmids through different host strains for the cloning and transposon-insertion steps. Recently, however, several commercial molecular biology vendors have developed in vitro transposition systems to take advantage of the random insertion of a modified transposon (e.g., Tn5). Unfortunately, some of these systems result in a high background of false positives, and are difficult to use with methods to screen positive clones by polymerase chain reaction (“PCR”). Applicants have utilized several of these transposon based sequencing methods and have not experienced any of these difficulties with a modified version of the in vitro GPS-1 transposition system from New England Biolabs. Nevertheless, transposon insertions cannot be directed exclusively to the target DNA of interest and appear in the vector with a high frequency. In an effort to solve this problem and increase the efficiency of transposon facilitated sequencing, Applicants have developed a unique, high-throughput procedure called transposon expedited multiplex sequencing (TEMS).

[0005] Accordingly, it is an object of this invention to provide a high-throughput, efficient, and inexpensive process for the sequencing of DNA fragments.

[0006] It is a further object of this invention to provide a high-throughput, efficient and inexpensive process for transposon-mediated sequencing of target DNA fragments which minimizes the amount of non-target DNA sequence generated.

[0007] It is yet another object of this invention to provide a PCR-based screen to distinguish between the desired constructs with transposons inserted into the target DNA sequence and the undesired constructs with transposons inserted elsewhere.

SUMMARY OF THE INVENTION

[0008] The present invention meets the above objects by providing the following method, which may be automated for further convenience. Multiple DNA target sequences, each cloned into a vector, are pooled and selectable transposons with sequencing primers on each end are inserted randomly into the DNA target-containing vectors. Selected, transposon- and DNA target sequence-containing vectors are then individually screened using a PCR reaction to identify those vectors which have transposons located in the DNA target sequence. The PCR reaction uses primers located at each end of the vector sequence and is optimized to provide sufficient extension time for the PCR polymerase to efficiently produce a product the length of the vector, but insufficient extension time for the PCR polymerase to efficiently produce a product the length of the vector plus the transposon. Therefore, significant amounts of PCR product will only be produced for vectors containing the transposon in the DNA target sequence and not the vector. The presence or absence of a significant amount of PCR product for each PCR reaction can be quickly determined by the use of a quantitative fluorescent dye selective for double stranded DNA (“dsDNA”). Vectors identified by a significant PCR product in the prescreen are then individually sequenced using the sequencing primers on each end of the transposon to read out into each DNA target sequence. The raw DNA target sequences of individual sequencing reactions can be combined to determine the full sequence of each of the DNA targets in the pool.

[0009] The parallel processing of clones greatly increases the speed with which reactions can be set up and sequenced. Additionally, the process does not rely on any particular vector and does not require any recloning steps. Most important for overall efficiency, sequencing of the vector backbone is minimized or eliminated by the screening step. Automation of each of the steps of the instant process can be readily achieved using equipment available to those in the art of automated DNA sequencing.

BRIEF DESCRIPTION OF THE DRAWINGS

[0010] FIG. 1 is a diagram of the steps of TEMS.

[0011] FIG. 2 shows the sequence of GPS-Apra transposon vector.

[0012] FIG. 3 shows the sequence of GPS-Apra-2 μm-URA3 transposon vector.

[0013] FIG. 4 is a picture of an electrophoresis gel stained to show the products of PCR reactions using various polymerases to multiply the vector portions of the transposon- and target-DNA-containing vectors.

[0014] FIG. 5 is a picture of an electrophoresis gel stained to show the products of PCR reactions having too great an extension time to eliminate an approximately 9 kb product.

[0015] FIG. 6 is a picture of an electrophoresis gel stained to show the products of PCR reactions at a variety of extension temperatures and having different numbers of cycles.

[0016] FIG. 7 is a picture of an electrophoresis gel stained to show the products of PCR reactions and a table of the PICOGREEN® results for the same PCR reactions.

DEFINITIONS

[0017] As used herein:

[0018] A “vector” is any DNA construct which replicates in a cell and has an insertion site at which unknown DNA sequences can be inserted.

[0019] “Selectable”, in reference to a vector or transposon, means that the vector or transposon carries a phenotype that can be used to identify cells transformed with the vector or transposon. Preferably, the phenotype allows the transformed cells to survive in conditions in which non-transformed cells cannot.

[0020] A “target DNA” or “target DNA sequence” is a DNA molecule of known or unknown sequence which the operator of the method desires to sequence.

[0021] A “representative number of individual transformants” is a minimum number of individual transformants which ensures that for each target DNA in the pool there will be a sufficient number of transformants containing insertions of the transposon into that target DNA to provide a desired number of independent sequencing determinations of each base in that target DNA.

[0022] “Substantial amounts of dsDNA” in regard to PCR products is an amount of dsDNA that exceeds a predetermined threshold. The predetermined threshold can be identified by evaluating the variability of PCR products found within PCR reactions with several constructs which are known to contain the transposon inside or outside the vector sequence.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0023] The present invention provides a method that can be efficiently automated for sequencing multiple target DNAs in parallel. In the method, each target DNA sequence is inserted into a selectable vector, preferably a plasmid, at identical insertion points in the vector sequence. It is important that all target DNAs are inserted into the same vector or, if different vectors are used, that each vector is substantially the same size and has identical sequences around the insertion point so that one set of primers can be used to amplify all the vectors by PCR reaction.

[0024] The vectors containing each of the multiple target DNA sequences are pooled in ratios based on the length of each target DNA sequence, so the pool contains substantially equal amounts of each kb of each target DNA sequence. The ratios for each vector to be included in the pool can be determined by determining the lengths of each target DNA sequence (for instance, by electrophoresis against standard size markers on an agarose gel), and adding to the pool a set amount of each vector for every kb in length of the target DNA sequence in that vector.
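The pooling rule of the preceding paragraph can be illustrated with a minimal sketch. The clone names, the insert lengths and the figure of 10 ng of vector per kb of insert are hypothetical values chosen only for illustration; they do not appear in this specification.

```python
# Hypothetical insert lengths in kb, e.g. estimated from an agarose gel.
insert_kb = {"clone_A": 1.5, "clone_B": 3.0, "clone_C": 4.5}

# Assumed "set amount" of vector added per kb of insert (illustrative only).
NG_PER_KB = 10.0


def pool_amounts(lengths_kb, ng_per_kb):
    """Return the amount of each vector to pool: a set amount per kb of insert."""
    return {name: kb * ng_per_kb for name, kb in lengths_kb.items()}


amounts = pool_amounts(insert_kb, NG_PER_KB)
for name, ng in amounts.items():
    print(f"{name}: {ng:.1f} ng")
```

Under this rule a vector carrying a three-fold longer insert contributes three times the amount to the pool.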

[0025] A transposon-insertion reaction is then performed on the pool to randomly insert a selectable transposon into the target-DNA containing vectors. Although this step could be performed by traditional, in vivo, transposon insertion, that would require moving a plasmid through more than one cell type. In vitro transposon-insertion reactions are much simpler and faster. The transposon must carry a selectable marker and must be of sufficient size to enable PCR conditions to be established that efficiently multiply a dsDNA fragment the size of the vector, but do not significantly multiply a dsDNA fragment the size of the vector plus the transposon. Preferably, the transposon is about equal to or greater than the vector in length. Most preferably, the transposon is equal to or greater than 5/4 the length of the vector. Particularly preferably, the transposon is equal to or greater than 5/3 the length of the vector. Where the vector is a bacterial plasmid, the transposon is preferably at least about 3 kb, most preferably at least about 4 kb and particularly preferably greater than or equal to about 5 kb.

[0026] The pool of vectors from the transposon-insertion reaction is then used to transform cells and the transformants are grown under conditions which select for the transposon. Selected transformants are isolated and grown into cultures. The number of isolated transformants grown into cultures depends on the number and size of target DNA sequences in the pool. Enough transformants must be isolated and grown to ensure that for each target DNA sequence, the isolated transformants will include enough individual insertions of the transposon into that particular target DNA sequence to provide several fold independent sequence determinations of each and every base in that target DNA sequence. To ensure against errors, those of skill in the art often seek to obtain a 6-fold coverage of sequence data, although they may opt for more or less, as the situation merits. When attempting to obtain 6-fold coverage, as a general rule of thumb, 18 clones will need to be actually sequenced per 1 kb of nucleotides to obtain this coverage. However, to actually sequence 18 clones, a greater number of transformants will have to be isolated and grown, because the transposon will not have integrated into the target DNA in every case, but will have randomly inserted over the whole target DNA-containing vector. Therefore, the number 18 is divided by the likelihood that the transposon inserted into the target DNA sequence. So, to obtain 6-fold sequencing coverage, for each target DNA sequence in the pool, the number of transformants to be isolated is equal to: the length of the target DNA sequence in kb times 18, times the ratio (size of vector)/(size of DNA target sequence).
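The rule of thumb in the preceding paragraph can be written out as a short calculation. This is a minimal sketch: the function name and the example sizes (a 2 kb target and a vector size of 3 kb) are hypothetical, and which length is supplied as the "size of vector" is left to the caller, since the paragraph states the ratio without further detail.

```python
import math

CLONES_PER_KB_6X = 18  # rule of thumb stated in the text for 6-fold coverage


def transformants_to_isolate(target_kb, vector_kb, clones_per_kb=CLONES_PER_KB_6X):
    """(target length in kb) x 18 x (size of vector / size of target), rounded up."""
    raw = target_kb * clones_per_kb * (vector_kb / target_kb)
    # Round first so floating-point noise cannot push an exact value up by one.
    return math.ceil(round(raw, 6))


# Hypothetical example: 2 kb target, vector size of 3 kb.
n_example = transformants_to_isolate(target_kb=2.0, vector_kb=3.0)
print(n_example)
```

Note that, as the formula is written, the target length cancels, so the result depends only on the vector size and the chosen number of clones per kb.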

[0027] Each culture from a selected transformant then has a PCR reaction performed on its DNA. The PCR reaction uses primers complementary to the 3′ ends of the vector sequence at the insertion point, which allow PCR multiplication of a fragment the size of the entire vector sequence. The reaction conditions of the PCR reaction are optimized to efficiently produce a fragment the length of the vector, but not to efficiently produce a fragment the length of the vector plus the transposon. The optimization of PCR conditions for any particular vector and transposon combination according to this invention can be determined by one of skill in the art through routine experimentation, as guided by such reference works as Kimmel, B., M. J. Palazzola, C. Martin, J. D. Boeke, and S. E. Devine, 1997, Optimizing PCR Assays, Methods for Improving PCR, Detecting and Characterizing PCR products, Protocols for Detecting and Characterizing PCR Products. In Genomic Analysis: A laboratory manual (ed. E Green, B. Birren, R. Myers, and P. Hieter), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.; and Cha, R. S. and Thilly, W. G. (1995) in: PCR Primer, Dieffenbach, C. W. and Dveksler, G. S. (eds.), CSH Press, New York. The particular DNA polymerase used for PCR should not be limiting for the method, although the choice of polymerase may help to optimize the reaction. Generally, the elongation temperature and/or time of the PCR are varied to achieve the desired exclusion of the vector plus transposon sized product. For a demonstration of optimization, see Example 2.

[0028] The amount of product produced in each PCR reaction is then measured, preferably optically. Optical measurement could be made by any method which distinguishes dsDNA from nucleotides, but is preferably made using a fluorescent dye specific for dsDNA. dsDNA specific dyes include the bis-benzimide dye Hoechst 33258 and the cyanine dyes of Molecular Probes, Inc, Eugene, Oreg., including PICOGREEN®. PICOGREEN® is preferred, due to its sensitivity and selectivity for dsDNA. The measured value of fluorescence is compared to a predetermined threshold level for determining whether a significant amount of dsDNA PCR product is present. The threshold level of fluorescence is set experimentally using positive and negative control PCR templates of the size of the vector alone and the vector plus the transposon, under the same PCR conditions that will be used to screen the isolated transformants.

[0029] Those transformants corresponding to PCR reactions which did not produce substantial amounts of dsDNA are not sequenced, because they contain the transposon in the vector sequence. Transformants corresponding to PCR reactions which produce substantial amounts of dsDNA contain the transposon in the target DNA sequence and are sequenced. Two sequencing reactions are performed for each transformant, using primers from each end of the transposon to read the sequence of the target DNA sequence surrounding the transposon. The sequencing reactions can be performed by any of the known methods of the art.

[0030] Once sequence data has been collected from all the transformants which passed the PCR screen, the sequences must be assembled into the individual members of the pool of target DNA sequences. This can be accomplished by methods of computer sequence analysis known in the art, and the exact manner is not limiting on this invention.

[0031] The advantages of the instant method include speed, efficiency and an ability to automate the process. Particularly, the instant method provides a quick PCR screen that allows for elimination of those unproductive transformants which have the transposon inserted into the vector sequence and would yield mainly vector sequences if sequenced. The PCR screen is fast, can be automated and does not require any isolation of the PCR product, such as running an agarose gel to determine its size. The use of a pool of target DNAs allows a single transposon reaction and a single transformation to be performed, instead of requiring one for each target DNA sequence.

EXAMPLES

Example 1

[0032] Choosing a Transposon to Fit a High-Throughput Screening Process

[0033] Original Transposon from New England Biolabs

[0034] The Genome Priming System Kit from New England Biolabs, 32 Tozer Road, Beverly, Mass. 01915 (“NEB”) was used to initially prove that a transposon inserted randomly into DNA targets could be an efficient way to completely sequence the targets with a great degree of accuracy. Supplied with the kit was one of two transposons that contained the kanamycin resistance gene. However, kanamycin is a fairly common resistance marker, and if a target DNA sequence was already cloned into a kanamycin-selectable vector, then either a different transposon or recloning of the target DNA sequence would be required. The kit provided another transposon that contained the chloramphenicol resistance marker, but that, too, is fairly common. Therefore, the transposon should be modified to contain an uncommon resistance marker, such as apramycin, to avoid additional cloning steps and reduce human error in choosing the correct transposon for each DNA target. The GPS-Apra transposon (FIG. 2) was constructed by modifying the NEB pGPSI transposon.

[0035] Screening Method Using the Modified Transposon, GPS-Apra

[0036] Initial experiments showed that a screening process could identify from transformants selected for the transposon those transformants containing the transposon in the target DNA sequence and not the vector. This reduces costs by eliminating unproductive sequencing of the vector sequence. The original NEB pGPSI transposon was 2.614 kb with ~1.7 kb of that, with priming sites at each end, being randomly inserted into the target DNA. By PCR amplification of the 3.0 kb vector using complement M13F and complement M13R primers, two distinct PCR products could be visualized on a slab gel. A 3.0 kb PCR product indicates that the transposon must be in the target DNA sequence, since the vector is 3.0 kb. A 4.7 kb PCR product indicates a vector (3.0 kb) containing the transposon (1.7 kb). Therefore, clones containing the transposon in the target DNA sequence could be identified by the presence of 3.0 kb PCR products on the gel.

[0037] To demonstrate identification of clones containing the transposon in the insert, colonies were selected from the transformation following the transposon reaction and inoculated into media containing both antibiotics for selection, apramycin and the plasmid antibiotic. The culture was grown up overnight and PCR was done the following day by kerplunking approximately 1 μL of culture into a PCR cocktail in a 96 well plate. The PCR cocktail included the Amplitaq enzyme and reagents from PE Corporation, 761 Main Avenue, Norwalk, Conn. 06859. The conditions for PCR were standard {95° C. for 5 min, (95° C. for 30 sec, 54° C. for 1 min, 72° C. for 6 min, 30 cycles), 72° C. for 1 min, 4° C. hold}. After completion of the cycle (approximately 2.5 hr), gel electrophoresis was done using 8 μL of the 50 μL PCR reaction on a standard size gel. There was a clearly detectable size distinction between the PCR products from vector sequence without the transposon and vector sequence including the transposon.

[0038] To maintain a high throughput for the sequencing reactions, however, electrophoresis would have to occur on large gels that could hold hundreds of samples at one time. On these large gels, the difference between a 3 kb and a 4.7 kb fragment would be difficult to reproducibly detect with an automated system. Also, gel electrophoresis was very time consuming and would interfere with performing the screen in a high-throughput manner.

[0039] In a search for other, high-throughput methods, fluorescent detection of dsDNA was settled on as reproducible and less time consuming. The PICOGREEN® assay seemed particularly convenient. However, fluorescent detection of dsDNA cannot distinguish between different size fragments, so the PCR reaction had to be altered so that the presence or absence of any PCR product indicates the position of the transposon in the target DNA containing vector. If only clones with the transposon in the target DNA sequence produced a dsDNA PCR product, then the PICOGREEN® dsDNA dye would fluoresce at a high level for only the desired clones. The wells that contain the transposon in the vector would not amplify a significant dsDNA product and would result in a low level of fluorescence. A cutoff level of fluorescence between desired and undesired clones can be determined by the use of known samples.

[0040] A variety of PCR conditions were tested by lowering the elongation temperature, but the 4.7 kb band would not consistently drop out without an increase in false positives. It was decided that if the transposon could be modified again to greatly increase its size, then PCR conditions could be developed that would consistently drop out the upper, transposon-containing band, thereby lowering the false positive rate. A large “stuffer” gene that would not affect the transposon reaction in a negative way was sought. A yeast 2 μm plasmid with the URA3 gene met the size criteria and could also be used in yeast experiments in the future. Therefore, a GPS-Apra-2 μm-URA3 transposon plasmid (FIG. 3) was constructed having a total size of 6.1 kb, including the transposon of ~5.0 kb. The PCR conditions were altered successfully to drop out the clones with the transposon in the vector and the transposition reaction was not affected. See Example 2.

Example 2

[0041] Optimization of PCR Conditions to Reduce False Negative and False Positive Results

[0042] Basic PCR Concept

[0043] This screen is based on PCR cycling conditions that have been specifically optimized to amplify a 3.9 kbp fragment of the entire vector DNA. In this case, the vector was pCR4Blunt-TOPO from Invitrogen, 1600 Faraday Ave, Carlsbad, Calif. 92008. If the vector has the 5 kbp transposon inserted into it, its size will be increased to 8.9 kbp, which is too large to be amplified under the optimized PCR conditions.

[0044] The primer sequences selected to amplify the vector portion of the clones are the complement sequences of the universal primers M13F(−20) and M13R. The sequences are as follows: 5′-ACTGGCCGTCGTTTTAC-3′ and 5′-CATGGTCATAGCTGTT-3′. This amplification is illustrated in FIG. 1.
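Because the screening primers are described as the complements of universal primers, a reverse-complement calculation offers a quick consistency check on the two sequences given above. The following is a minimal sketch; the function and variable names are illustrative, and the universal primer sequences themselves are not stated in this specification, so the computed strings should be compared against vendor documentation rather than taken as authoritative.

```python
# Map each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Screening primers as given in the text (5' to 3').
cM13f = "ACTGGCCGTCGTTTTAC"
cM13r = "CATGGTCATAGCTGTT"

# These should correspond to the universal primers the text refers to.
m13f_from_text = reverse_complement(cM13f)
m13r_from_text = reverse_complement(cM13r)
print(m13f_from_text)
print(m13r_from_text)
```

Because the screening primers are the complements of primers that read into the insert, they point outward from the insertion site and amplify around the vector backbone rather than across the target DNA.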

[0045] Background on the Development of the 384-Well Format

[0046] In order to screen transposon reactions from pools containing multiple clones it was necessary to develop a screen that was both quick and cost efficient. Initial development attempts were based on the 96-well protocol. To change to a 384-well plate, all reagent volumes were halved to accommodate the smaller well volume. However, these conditions proved to be less than ideal due to the high number of PCR failures that were being observed. A search for PCR reagents and conditions that were more robust for this format was undertaken. The results of experiments that tested a wide range of enzymes and thermocycling conditions will be described in the following sections.

[0047] 384-Well PCR Optimization Process

[0048] Polymerase Selection

[0049] A quick screening of a variety of PCR kits from several vendors was done to determine a 384-well replacement for the PE Corporation AmpliTaq polymerase that had been used in the 96-well format. The following PCR kits were tested: Clontech's Advantage cDNA PCR kit, Epicentre's Failsafe PCR system, and three kits from Takara: LA Taq, Ex Taq, and Z Taq. (Clontech Laboratories Inc. is located at 1020 East Meadow Circle, Palo Alto, Calif. 94303-4230; Epicentre Technologies is located at 1402 Emil Street, Madison, Wis. 53713; and Takara Shuzo Co., LTD has a business address of Biomedical Group Seta 3-4-1, Otsu, Shiga, 520-2193, Japan.) Samples tested represented 30 pooled DNA targets. They were cloned into pBluescript SK- (3.0 kb) from Stratagene, 11011 North Torrey Pines Road, La Jolla, Calif. 92037. The growth time was 20 hr in 384 well format. Based on preliminary data the Z-Taq system was chosen as the enzyme to pursue due to properties it possessed which made it ideal for use in a high-throughput system like TEMS. See FIG. 4, which shows a stained electrophoresis gel on which the PCR products of the samples have been run.

[0050] Reagent Optimization

[0051] It was also desirable to decrease the cost per reaction of the PCR screen. Table 2, shown below, summarizes the various experiments that were done to test the amount by which each of the various reagents could be lowered without compromising the results.

TABLE 2
Table summarizing the various PCR cocktail ingredients that were tested during the optimization process.

Variable Tested      Condition Tested              Optimal Result
Z-Taq polymerase     0.625, 0.313, 0.156 Units     0.313 Units
10X Z-Taq Buffer     1x, 0.5x                      1x
Primers              2.5, 5, 10 pmol               5 pmol
Primers              cM13f & cM13r; cT3 & cT7      cM13f & cM13r
dNTP's               100, 150, 200 μM              150 μM

[0052] Based on the above experiments the following optimized reaction cocktail mix (per sample) was determined:

[0053] ddH₂O: 17.875 μL

[0054] 10×Z-Taq buffer: 2.5 μL

[0055] dNTP mixture: 1.5 μL

[0056] template DNA: 1 μL (deposited using a Kerplunker #96/384, manufactured by Nalge Nunc International, 75 Panorama Creek Drive, Rochester, N.Y. 14625, and Pin Replicator #250520/250393)

[0057] Z-Taq DNA polymerase: 0.125 μL (0.3125 Units)

[0058] cM13f (5 pmol/μL): 1 μL

[0059] cM13r (5 pmol/μL): 1 μL
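The per-sample recipe above sums to the 25 μL reaction volume and can be scaled to a full plate. The following is a minimal sketch: the volumes are those listed above, while the 10% pipetting overage and the choice to leave the template out of the master mix (because it is deposited separately by the pin tool) are assumptions made only for illustration.

```python
# Per-reaction volumes in microliters, as listed in the recipe above.
recipe_ul = {
    "ddH2O": 17.875,
    "10x Z-Taq buffer": 2.5,
    "dNTP mixture": 1.5,
    "template DNA": 1.0,
    "Z-Taq DNA polymerase": 0.125,
    "cM13f (5 pmol/uL)": 1.0,
    "cM13r (5 pmol/uL)": 1.0,
}

total_ul = sum(recipe_ul.values())

# Enzyme concentration implied by 0.125 uL = 0.3125 Units.
units_per_ul = 0.3125 / recipe_ul["Z-Taq DNA polymerase"]


def master_mix(recipe, wells, overage=0.10):
    """Scale every component except the template; 'overage' is an assumed extra."""
    factor = wells * (1.0 + overage)
    return {k: v * factor for k, v in recipe.items() if k != "template DNA"}


mix_384 = master_mix(recipe_ul, wells=384)
print(f"total per reaction: {total_ul} uL, enzyme: {units_per_ul} U/uL")
for name, vol in mix_384.items():
    print(f"{name}: {vol:.1f} uL")
```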

[0060] Optimization of Thermocycler Conditions

[0061] Initial thermocycling conditions used were based on recommendations found in the Takara literature. These conditions were:
1. denature: 98° C. for 5 seconds
2. anneal: 55° C. for 10 seconds
3. extension: 72° C. for 60 seconds (15.4 seconds/kb)
4. go to 1, 30 times
5. 4° C. until the PCR reaction is evaluated.

[0062] The results of using these conditions are shown in FIG. 2. From that data, it was decided that the annealing temperature should be lowered to decrease the reaction specificity and thus increase the number of positives that are amplified. In the next experiment the annealing temperature was lowered to 52° C. Although this helped with the overall reaction results, it was discovered that using an extension time of 15.4 seconds per kbp resulted in the amplification of vector containing the transposon insertion. See FIG. 5, which shows a stained electrophoresis gel on which PCR products from 47 transformants selected after transposon insertion in target DNA-containing pBluescript SK- and grown for 20 hours in wells of a 384 well plate were run. The appearance of a 9 kbp band, believed to be the amplified product of vector containing the transposon, is the result of an excessive extension time.

[0063] Additionally, an annealing temperature gradient between 48° C. and 68° C. was tested to confirm that 52° C. was indeed the optimal temperature to use for this PCR. In the same experiment the number of cycles was also tested: 30, 35, 40, and 45 cycles. From the results of that experiment it was determined that the number of cycles needs to remain low to avoid background noise. From the 48° C. to 68° C. gradient the 52° C. annealing temperature was reconfirmed as being optimal. The temperature is not so low that it becomes non-specific and not so high that it risks approaching the 56° C. temperature at which the PCR begins to fail. These results are shown in FIG. 6, a picture of a stained electrophoresis gel, where each depicted product represents amplification of the control pBluescript SK- vector at 2.5 ng per well, in a 384 well format with 25 μL PCR cocktail volume. Based on the above data, the thermocycling conditions were determined to perform optimally using the following cycle:
1. denature: 98° C. for 5 seconds
2. anneal: 52° C. for 10 seconds
3. extension: 72° C. for 30 seconds (7.7 seconds/kb)
4. go to 1, 35 times
5. 4° C. until evaluation of the PCR reaction.
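The seconds-per-kb figures quoted with the extension steps follow directly from the 3.9 kbp vector size given in this Example. The following minimal sketch reproduces them and also shows the extension time available per kb if the 8.9 kbp vector-plus-transposon template had to be copied in the same 30 seconds; the function name is illustrative, and the calculation is simple arithmetic, not a model of polymerase kinetics.

```python
VECTOR_KB = 3.9                  # vector alone, from the text
VECTOR_PLUS_TRANSPOSON_KB = 8.9  # vector with the 5 kbp transposon, from the text


def seconds_per_kb(extension_s, product_kb):
    """Extension time available per kb of the intended PCR product."""
    return extension_s / product_kb


initial = round(seconds_per_kb(60, VECTOR_KB), 1)    # initial conditions
optimized = round(seconds_per_kb(30, VECTOR_KB), 1)  # optimized conditions
with_tn = round(seconds_per_kb(30, VECTOR_PLUS_TRANSPOSON_KB), 1)

print(f"60 s on {VECTOR_KB} kb: {initial} s/kb")
print(f"30 s on {VECTOR_KB} kb: {optimized} s/kb")
print(f"30 s on {VECTOR_PLUS_TRANSPOSON_KB} kb: {with_tn} s/kb")
```

Halving the extension time leaves less than half as much time per kb for the larger template as for the vector alone, which is consistent with the disappearance of the approximately 9 kbp band described above.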

[0064] Since evaporation was found to be a problem in the 384-well plates, rubber ‘P’ seals from MJ Research, 590 Lincoln Street, Waltham, Mass. 02451, are placed over sealed plates before thermocycling begins. Along with this, and as recommended by the manufacturer, the lid temperature is set to 85° C.

[0065] Table 3, shown below, provides a summary of the various PCR thermocycling conditions tested during the optimization process.

TABLE 3
Summary of the PCR thermocycling conditions that were tested during the optimization.

Variable Tested          Condition Tested     Optimal Result
Annealing Temperature    38° C.-68° C.        52° C.
PCR cycle                30, 35, 40, 45       35

[0066] Screening for Positive Clones Using the dsDNA Quantitation Reagent PICOGREEN®

[0067] To bypass the labor-intensive task of gel electrophoresis, the TEMS PCR screen utilizes the dsDNA quantitation reagent, PICOGREEN®. The PICOGREEN® is diluted to a working dilution of 1:150 in 1×TE buffer. This dilution was found to be optimal, although it is slightly more concentrated than the 1:200 dilution recommended in the manufacturer's protocol.

[0068] 25 μL per well of 1:150 PICOGREEN® is aliquoted into black 384-well plates using a Robbins Hydra 96 Dispenser from Robbins Scientific, 1250 Elko Drive, Sunnyvale, Calif. 94089-2213. 5 μL per well of each PCR reaction is then added to the PICOGREEN®, again using the Hydra Dispenser. The plate is then immediately placed into a Molecular Dynamics Gemini Fluorescence Plate Reader and mixed for 10 seconds before its fluorescence signal is measured.

[0069] For the assay to function as a screen, there must be a clear difference in the fluorescence signal between positive and negative wells. PCR product amounts of 1.3, 2.5, 5, and 10 μL were tested to determine the greatest assay range of positive signal to background. Results showing that 5 μL of PCR product into 25 μL of PICOGREEN® is the optimal condition to use are shown in Table 4.

TABLE 4
Fluorescence signal detected using the PICOGREEN® assay with various amounts of PCR product. The 24 PCR products are from samples representing a pool of 11 DNA targets with sizes up to 4.5 kb cloned into the vector pCR4Blunt-TOPO. The transposon used was pGPS-Apra-2 μm-URA3.

μL PCR      1     2     3     4     5     6     7     8     9    10    11    12    13    14
1.3       353   287   206   987   180   345  1734   351   171   312   309   237  1102   308
2.5       839   827   495  1924   412   751  2941   694   397   710   711   413  2129   541
5        1489  1503   835  2769   635  1450  3703  1640   813  1220  1447   882  2991  1198
10       1806  2234  1775  3023  1354  2099  3870  2112  1378  2095  1909  1746  3401  1882

μL PCR     15    16    17    18    19    20    21    22    23    24   high-low
1.3       202   245   843   134   126   318   223   308   635   223   1608
2.5       456   549  1713   313   217   634   407   690  1434   552   2724
5         934  1003  2560   624   552  1259  1112   873  2341  1116   3151
10       1681  1574  2902  1197   983  1637  1642  1560  2967  1333   2887
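The “high-low” column of Table 4 is the difference between the largest and smallest of the 24 readings at each PCR product volume. The following minimal sketch recomputes it from the tabulated values; the variable names are illustrative and the data are transcribed from Table 4.

```python
# Fluorescence readings for samples 1-24 at each volume of PCR product (uL).
table4 = {
    1.3: [353, 287, 206, 987, 180, 345, 1734, 351, 171, 312, 309, 237, 1102, 308,
          202, 245, 843, 134, 126, 318, 223, 308, 635, 223],
    2.5: [839, 827, 495, 1924, 412, 751, 2941, 694, 397, 710, 711, 413, 2129, 541,
          456, 549, 1713, 313, 217, 634, 407, 690, 1434, 552],
    5: [1489, 1503, 835, 2769, 635, 1450, 3703, 1640, 813, 1220, 1447, 882, 2991,
        1198, 934, 1003, 2560, 624, 552, 1259, 1112, 873, 2341, 1116],
    10: [1806, 2234, 1775, 3023, 1354, 2099, 3870, 2112, 1378, 2095, 1909, 1746,
         3401, 1882, 1681, 1574, 2902, 1197, 983, 1637, 1642, 1560, 2967, 1333],
}

# Range (highest minus lowest reading) at each volume.
signal_range = {vol: max(vals) - min(vals) for vol, vals in table4.items()}

# Volume giving the widest separation between strongest and weakest wells.
best_volume = max(signal_range, key=signal_range.get)

for vol, rng in signal_range.items():
    print(f"{vol} uL: high-low = {rng}")
print(f"widest range at {best_volume} uL")
```

The recomputed ranges match the tabulated column, and the widest range falls at 5 μL, in line with the condition selected in the text.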

[0070] The signal output generated by the fluorescence reader is then exported to a histogram display, and a cutoff value for positive clones is assigned from the histogram. Using the current conditions, the cutoff generally falls between 1000 and 1500.
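A histogram of this kind can be sketched in Python as follows. This is an illustration rather than the display actually used: the readings are the 5 μL row of Table 4, and the bin width of 500, the cutoff of 1500, and the function names are assumptions chosen for the example.

```python
from collections import Counter

# 5 uL readings for samples 1-24, transcribed from Table 4.
READINGS_5UL = [1489, 1503, 835, 2769, 635, 1450, 3703, 1640, 813, 1220,
                1447, 882, 2991, 1198, 934, 1003, 2560, 624, 552, 1259,
                1112, 873, 2341, 1116]


def histogram(readings, bin_width=500):
    """Count readings per bin; each key is the lower edge of its bin."""
    counts = Counter((value // bin_width) * bin_width for value in readings)
    return dict(sorted(counts.items()))


def count_positive(readings, cutoff):
    """Number of wells at or above the cutoff (assumed inclusive)."""
    return sum(1 for value in readings if value >= cutoff)


bins = histogram(READINGS_5UL)
for lower_edge, count in bins.items():
    print(f"{lower_edge:>4}-{lower_edge + 499:<4} {'#' * count}")
print("wells at or above 1500:", count_positive(READINGS_5UL, 1500))
```

In a text display such as this, a sparsely populated region between a low-signal cluster and the high-signal wells is one plausible place to set the cutoff.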

[0071] A Genesis RSP 150 TECAN robotics station (TECAN U.S. INC., P.O. Box 13953, Research Triangle Park, N.C. 27709) was used to generate 96-deep-well plates containing cells from the clones identified by the screen as having the transposon in the target DNA sequence. Cells from each positive clone are placed in one of the wells, in Superbroth containing the appropriate antibiotics (dual selection). These plates are grown and then the cultures are submitted for automated sequencing.

Example of the Optimized PCR Screen Technique for 384-Well Plates

[0072] Use Q-Bot to pick colonies of the transposon-containing clones into Genetix 384-well shallow-well plates containing 65 μL per well of Superbroth plus appropriate antibiotic. Place plates into a humidified 37° C. incubator for 19-22 hours. Aliquot 25 μL per well of sterile 50% glycerol into the growth plates using a Multidrop 384 115V liquid dispenser (Labsystems Inc., 8 East Forge Parkway, Franklin, Mass. 02038). Using the glycerol plate as the template, set up PCR using the following recipe:

[0073] ddH₂O: 17.875 μL

[0074] 10×Z-Taq buffer: 2.5 μL

[0075] dNTP mixture: 1.5 μL

[0076] template DNA*: 1 μL (using the Kerplunker, pin-tool)

[0077] Z-Taq DNA polymerase: 0.125 μL (0.3125 Units)

[0078] cM13f (5 pmol/μL): 1 μL

[0079] cM13r (5 pmol/μL): 1 μL
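The recipe above can be checked and scaled to a full plate with a short Python sketch. It is illustrative only: the 10% overage and the function name are assumptions, and the 2.5 units/μL polymerase stock concentration is taken from the Z-Taq kit description elsewhere in this document. The template is left out of the master mix because it is delivered separately by the pin tool.

```python
# Per-reaction volumes in uL, from the recipe above.
RECIPE_UL = {
    "ddH2O": 17.875,
    "10x Z-Taq buffer": 2.5,
    "dNTP mixture": 1.5,
    "template DNA": 1.0,
    "Z-Taq DNA polymerase": 0.125,
    "cM13f (5 pmol/uL)": 1.0,
    "cM13r (5 pmol/uL)": 1.0,
}
TEMPLATE = "template DNA"
POLYMERASE_UNITS_PER_UL = 2.5  # Z-Taq stock concentration


def master_mix(recipe, wells, overage=0.10):
    """Scale every non-template component to `wells` reactions.

    `overage` is an assumed safety margin for pipetting losses.
    """
    scale = wells * (1 + overage)
    return {name: volume * scale
            for name, volume in recipe.items() if name != TEMPLATE}


total_ul = sum(RECIPE_UL.values())
units = RECIPE_UL["Z-Taq DNA polymerase"] * POLYMERASE_UNITS_PER_UL
print(f"reaction volume: {total_ul} uL, polymerase: {units} units")

for name, volume in master_mix(RECIPE_UL, wells=384).items():
    print(f"{name:<22} {volume:9.1f} uL")
```

The components sum to the 25 μL reaction volume, and 0.125 μL of polymerase at 2.5 units/μL gives the 0.3125 units stated in the recipe.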

[0080] Place the PCR reaction plate into a thermocycler and start the following cycle:
1. denature: 98° C. for 5 seconds
2. anneal: 52° C. for 10 seconds
3. extension: 72° C. for 30 seconds (7.7 seconds/kb)
4. go to 1, 35 times
5. 4° C. until the PCR reaction is evaluated.
Total cycling time: 1 hour, 18 minutes, 30 seconds.
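The reasoning behind the short extension step can be illustrated numerically. The Python sketch below is a rough estimate under stated assumptions: it treats the 7.7 seconds/kb figure as a constant extension rate and uses the approximately 5 kb stuffer-containing transposon described in this document. The function names are hypothetical.

```python
EXTENSION_S = 30.0     # extension time per cycle, from the program above
RATE_S_PER_KB = 7.7    # extension rate quoted in the program
TRANSPOSON_KB = 5.0    # assumed size of the stuffer-containing transposon


def max_product_kb(extension_s, rate_s_per_kb):
    """Longest product that can be completed within one extension step."""
    return extension_s / rate_s_per_kb


def extension_needed_s(product_kb, rate_s_per_kb):
    """Extension time required to complete a product of the given length."""
    return product_kb * rate_s_per_kb


vector_amplicon_kb = max_product_kb(EXTENSION_S, RATE_S_PER_KB)
with_transposon_s = extension_needed_s(vector_amplicon_kb + TRANSPOSON_KB,
                                       RATE_S_PER_KB)

print(f"30 s covers about {vector_amplicon_kb:.1f} kb")
print(f"the same amplicon plus transposon needs {with_transposon_s:.1f} s")
```

Under these assumptions, a vector amplicon interrupted by the transposon would need more than twice the programmed extension time, which is why such clones are expected to yield little product.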

[0081] While the reaction is underway, thaw the PICOGREEN® reagent at room temperature.

[0082] Prepare a 1:150 dilution of PICOGREEN® in 1×TE Buffer. Prepare 10 mL per 384-well plate. Use the Robbins Hydra to aliquot 25 μL per well to an entire 384-well black plate. Use the Robbins Hydra to dispense 5 μL per well of the PCR reaction into the PICOGREEN® plate. Immediately place the plate into the fluorescence plate reader and mix by shaking for 10 seconds. Export the fluorescence signal data to a floppy disk. Import data into an Excel macro and generate a rearray list of positive clones using 1500 as the threshold. See FIG. 7 for an example of final PICOGREEN® data obtained from a PCR screen. Open the rearray list in the TECAN robot and rearray clones from the glycerol stock plate into a 96-well deep-well plate containing Superbroth and appropriate antibiotics. After the plates have been incubated, the cultures in them are sequenced in an automated process.
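The rearray step is performed above with an Excel macro. As an illustration of the same logic, the following Python sketch (a substitute technique, not the macro itself) selects wells at or above the threshold and assigns each one to the next free well of a 96-well plate. The row-major fill order, the inclusive threshold, the tuple layout, and all names are assumptions.

```python
import string


def well_names(n_rows, n_cols):
    """Well labels in row-major order, e.g. A1, A2, ..., B1, ..."""
    return [f"{string.ascii_uppercase[row]}{col}"
            for row in range(n_rows) for col in range(1, n_cols + 1)]


WELLS_384 = well_names(16, 24)   # rows A-P, columns 1-24
WELLS_96 = well_names(8, 12)     # rows A-H, columns 1-12


def rearray_list(readings, threshold=1500):
    """Map each positive 384-well source to a 96-well destination.

    `readings` maps a 384-well label to its fluorescence signal.
    Returns (source well, destination plate number, destination well).
    """
    positives = [well for well in WELLS_384
                 if readings.get(well, 0) >= threshold]
    return [(source, index // 96 + 1, WELLS_96[index % 96])
            for index, source in enumerate(positives)]


example = {"A1": 1489, "A2": 1503, "B1": 3703, "C5": 635, "P24": 2991}
for source, plate, destination in rearray_list(example):
    print(f"{source:>3} -> plate {plate}, well {destination}")
```

When more than 96 clones pass the threshold, the list simply continues onto a second destination plate.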

Example 3

[0083] Transposon Expedited Multiplex Sequencing of Unknown Sequences in Vector PCR4Blunt-TOPO

[0084] Transposon Reaction

[0085] The transposon reaction was performed using the Genome Priming System (GPS) kit from New England BioLabs. All of the components provided in the kit were used with the exception of the transposon GPS1. The transposon utilized was a modified version of GPS1. Two variations were generated. In one variant, pGPS-Apra, the existing resistance marker has been replaced by the apramycin resistance gene. This was done to ensure that the transposon could be used to sequence any vector regardless of its resistance marker. In another variant, pGPS-Apra-Y2 um, a “stuffer” region containing the yeast 2 um replication origin was cloned into the transposon, thus increasing its size to 5 kb (FIG. 6). This alteration was created in order to improve the efficiency of the process by screening out transpositions into the vector backbone. The reaction was transformed by electroporation into DH10B E. coli competent cells and plated on Luria-Bertani (1.0% bacto-tryptone, 0.5% bacto-yeast extract, 1.0% NaCl, pH 7.0) agar plates containing the appropriate antibiotics (apramycin (100 μg/ml)/kanamycin (50 μg/ml)) for selection of clones harboring the transposon. The growth period is 20-22 hours.

[0086] TEMS (Transposon Expedited Multiplex Sequencing)

[0087] The size of all target DNA sequences subjected to TEMS was evaluated by restriction digest of the target DNA-containing vectors with NotI. Plasmid sets containing target DNA sequences of various sizes contained in the same vector backbone were pooled. Equimolar concentrations were calculated for each target DNA sequence prior to pooling. The transposon reaction was performed treating the pooled DNA template as a single template. For six-fold coverage of each base of each target DNA sequence, the total size of each target DNA sequence was multiplied by our standard factor of 18 clones per 1 kb of sequence to determine the number of randomly selected clones to sequence. Based on the total size of the vector and the total size of the target DNA sequence, a probability of the location of the transposon insertion was calculated. Multiplying this factor by the number of clones for six-fold coverage determines how many clones to randomly select and subject to the PCR screen.
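The clone-number calculation can be sketched in Python as follows. The text does not spell out the exact form of the probability factor, so this sketch makes two labeled assumptions: insertions are uniformly distributed over the construct, so that the fraction landing in the target equals target size divided by total construct size, and the multiplying factor is the reciprocal of that fraction. The function names and the example sizes are hypothetical.

```python
import math

CLONES_PER_KB = 18  # standard factor for six-fold coverage


def clones_to_sequence(target_kb):
    """Clones with target insertions needed for six-fold coverage."""
    return math.ceil(CLONES_PER_KB * target_kb)


def fraction_in_target(target_kb, vector_kb):
    """Assumed probability that a random insertion lands in the target."""
    return target_kb / (target_kb + vector_kb)


def clones_to_screen(target_kb, vector_kb):
    """Colonies to pick so that enough target insertions are expected."""
    needed = clones_to_sequence(target_kb)
    # Equivalent to needed / fraction_in_target(target_kb, vector_kb).
    return math.ceil(needed * (target_kb + vector_kb) / target_kb)


target_kb, vector_kb = 3.0, 4.0  # illustrative sizes only
print("clones to sequence:", clones_to_sequence(target_kb))
print("fraction in target:", round(fraction_in_target(target_kb, vector_kb), 3))
print("clones to screen:  ", clones_to_screen(target_kb, vector_kb))
```

For a 3 kb target in a 4 kb vector, 54 sequenced clones are needed, and under the uniform-insertion assumption about 126 colonies would be picked for the PCR screen.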

[0088] The calculated number of colonies was selected and inoculated into Superbroth (bacto-tryptone, yeast extract, NaCl, NaOH) media containing the appropriate antibiotics on 96- or 384-well culture plates. These culture plates were grown overnight without shaking for 18-22 hours, depending on the format of the growth plate (96/384 well). Glycerol stocks were made for temporary storage of the plate: 50% glycerol was added directly to the plate and mixed. This allowed indefinite storage at −80° C.

[0089] Screening out Vector Insertions by PCR

[0090] The primer sequences selected to amplify the vector portion of the clones for screening were the complement sequences of the universal primers M13F(−20) and M13R. The sequences are as follows: 5′-ACTGGCCGTCGTTTTAC-3′ and 5′-CATGGTCATAGCTGTT-3′. Culture PCR was assembled with the following conditions: the template was deposited into the PCR reaction reagent preparation using a pin replicator tool, which aliquoted approximately 1 μL. For the 96-well format, the 10×PCR buffer composition was 100 mM Tris-HCl pH 8.3; 500 mM KCl; 15 mM MgCl2; 0.01% w/v gelatin. Five picomoles of the forward primer and five picomoles of the reverse primer were used in each reaction. The dNTP concentration in the reaction was 2.5 mM final concentration. The quantity of Taq polymerase added to each reaction was 0.1 units. All reactions were carried out in a total volume of 50 μL. The 384-well format PCR was done in a total volume of 25 μL using the TaKaRa Z-Taq kit obtained from Panvera Corporation, 545 Science Drive, Madison, Wis. 53711. This kit consists of Z-Taq polymerase (2.5 units/μL), 10×Z-Taq buffer containing a 30 mM concentration of Mg2+, and a dNTP mixture (2.5 mM each). Thermocyclers were run with a heated lid (MJ Research). Because of the short elongation time, this PCR cycle was designed to amplify only the vector without the transposon. Black PCR plates were used so that the fluorescent analysis could later be performed directly in the plates.
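The relationship between the screening primers and the universal primers can be checked by taking reverse complements. The Python sketch below is illustrative: the two screening primer sequences are those given above, while the M13F(−20) reference sequence is the commonly published 17-mer and is an assumption not taken from this document.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


SCREEN_FORWARD = "ACTGGCCGTCGTTTTAC"   # first screening primer
SCREEN_REVERSE = "CATGGTCATAGCTGTT"    # second screening primer

# Commonly published M13F(-20) sequence (assumption, not from this text).
M13F_MINUS_20 = "GTAAAACGACGGCCAGT"

print(reverse_complement(SCREEN_FORWARD))
print(reverse_complement(SCREEN_REVERSE))
print(reverse_complement(SCREEN_FORWARD) == M13F_MINUS_20)
```

Because each screening primer is the reverse complement of a universal primer site, the pair primes outward from the cloning site and around the vector backbone rather than across the insert.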

[0091] Analysis Conditions Using PICOGREEN®

[0092] Upon completion of the PCR, 25 μL of PICOGREEN® was added to the PCR plate. A dilution of PICOGREEN® stock 1:200 in 1×TE (tris-acetate, EDTA) buffer at 50 μL volume was mixed with 10 μL of PCR reaction. The plate was read in a spectrofluorometer where the molecules are excited at 480 nm and the signal is emitted at 540 nm. The PCR products fluorescing above a predetermined threshold were the positive clones to be sequenced. The threshold was experimentally determined using known constructs containing the transposon either in the vector or in an insert to the vector. The positive clones were sorted and compressed into a format suitable for array by the TECAN robotic arm. Glycerol stocks were inoculated into a 96-deep-well plate containing 1.5 ml of Superbroth (bacto-tryptone, yeast extract, NaCl, 5N NaOH) and appropriate antibiotics. These cultures were grown for 24 hours, and the DNA targets from the cultures were then sequenced using the primers at either end of the transposon.

[0093] Sequence Assembly

[0094] After vector trimming and base calling of the raw sequences using Phred software, the sequences are assembled using Phrap software (see Brent Ewing, LaDeana Hillier, Michael C. Wendl, and Phil Green. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. 1998. Genome Research 8:175-185; and Brent Ewing and Phil Green. Base-calling of automated sequencer traces using phred. II. Error probabilities. 1998. Genome Research 8:186-194). The assembly is reviewed using consed software. (See Consed: A Graphical Tool for Sequence Finishing, Gordon et al. 1998.) Criteria for completion are high quality, six-fold coverage of the entire full-length gene. The expected error rate per 10 kb of sequence should be <0.1. If these criteria are not met, primers are selected along with templates for primer walking sequencing of the error-prone section of target DNA sequence. The primer walking sequencing is done on either the ABI 310 sequencer or through high-throughput sequencing using the ABI 3700, depending on the volume of sequencing needed, and those sequences are subsequently added to the assembly. BLAST analysis is done for comparison, especially to known genes. (See: Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990) “Basic local alignment search tool.” J. Mol. Biol. 215:403-410; Gish, W. & States, D. J. (1993) “Identification of protein coding regions by database similarity search.” Nature Genet. 3:266-272; Madden, T. L., Tatusov, R. L. & Zhang, J. (1996) “Applications of network BLAST server” Meth. Enzymol. 266:131-141; Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. (1997) “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.” Nucleic Acids Res. 25:3389-3402; and Zhang, J. & Madden, T. L. (1997) “PowerBLAST: A new network BLAST application for interactive or automated sequence analysis and annotation.” Genome Res. 7:649-656.) The generated full-length sequence is imported into another assembly program called Sequencher. (See Cash, H., Clark, B., Galt, J., Garb, C., Goebel III, C. J., & Singer, J. Sequencher User Guide “The complete Software Solution for Sequencing DNA”. Gene Codes Corporation. 1999.) With this program, it is easy to view the open reading frame to be sure mutations, if any, have not caused a frame shift.
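The error-rate criterion can be related to per-base quality values. The Python sketch below is illustrative only: it uses the phred definition of a quality value, Q = −10·log10(P), where P is the estimated error probability, and it assumes that one quality value per consensus base is available from the assembly. The function names are hypothetical.

```python
def error_probability(quality):
    """Error probability implied by a phred-scaled quality value."""
    return 10 ** (-quality / 10)


def expected_errors_per_10kb(qualities):
    """Expected number of errors, normalized to 10 kb of sequence."""
    total = sum(error_probability(q) for q in qualities)
    return total * 10_000 / len(qualities)


def meets_criterion(qualities, limit=0.1):
    """True if the expected error rate per 10 kb is below the limit."""
    return expected_errors_per_10kb(qualities) < limit


# Illustrative consensus qualities for a 2 kb sequence.
uniform_q60 = [60] * 2000
with_weak_region = [60] * 1900 + [30] * 100

print(round(expected_errors_per_10kb(uniform_q60), 4),
      meets_criterion(uniform_q60))
print(round(expected_errors_per_10kb(with_weak_region), 4),
      meets_criterion(with_weak_region))
```

A limit of 0.1 errors per 10 kb corresponds to an average error probability of one in 100,000 bases, so even a short stretch of lower-quality consensus can fail the criterion and trigger primer walking.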

[0095] It is to be understood that the foregoing examples are exemplary and explanatory only and are not restrictive of the invention. Various changes may be made to the embodiments described above by one of skill in the art without departing from the scope of the invention, as defined by the following claims.

1 4 1 2613 DNA Artificial GPS-Apra transposon vector 1 ggtaccctgtgaatgcgcaa accaaccctt ggcagaacat atccatcgcg tccgccatct 60 ccagcagccgcacgcggcgc atctcgggca gcgttgggtc ctggccacgg gtgcgcatga 120 tcgtgctcctgtcgttgagg acccggctag gctggcgggg ttgccttact ggttagcaga 180 atgaatcaccgatacgcgag cgaacgtgaa gcgactgctg ctgcaaaacg tctgcgacct 240 gagcaacaacatgaatggtc ttcggtttcc gtgtttcgta aagtctggaa acgcggaagt 300 cagcgccctgcaccattatg ttccggatct atgtcgggtg cggagaaaga ggtaatgaaa 360 tggcagatccctggcttgtt gtccacaacc gttaaacctt aaaagcttta aaagccttat 420 atattcttttttttcttata aaacttaaaa ccttagaggc tatttaagtt gctgatttat 480 attaattttattgttcaaac atgagagctt agtacgtgaa acatgagagc ttagtacgtt 540 agccatgagagcttagtacg ttagccatga gggtttagtt cgttaaacat gagagcttag 600 tacgttaaacatgagactta gtacgtgaaa catgagagct tagtacgtac tatcaacagg 660 ttgaactgctgatcttcgga tctatgtcgg gtgcggagaa agaggtaatg aaatggcatc 720 cggatctgcatcgcaggatg ctgctggcta ccctgtggaa cacctacatc tgtattaacg 780 aagcgctggcattgaccctg agtgattttt ctctggtccc gccgcatcca taccgccagt 840 tgtttaccctcacaacgttc cagtaaccgg gcatgttcat catcagtaac ccgtatcgtg 900 agcatcctctctcgtttcat cggtatcatt acccccatga acagaaatcc cccttacacg 960 gaggcatcagtgaccaaaca ggaaaaaacc gcccttaaca tggcccgctt tatcagaagc 1020 cagacattaacgcttctgga gaaactcaac gagctggacg cggatgaaca ggcagagctc 1080 ttactgtcatgccatccgta tgtgggcgga caataaagtc ttaaactgaa caaaatagat 1140 ctaaactatgacaataaagt cttaaactag acagaatagt tgtaaactga aatcagtcca 1200 gttatgctgtgaaaaagcat actggacttt tgttatggct aaagcaaact cttcattttc 1260 tgaagtgcaaattgcccgtc gtattaaaga ggggcgtggg gtcgacgcgg ccgcgccgta 1320 tttgcagtaccagcgtacgg cccacagaat gatgtcacgc tgaaaatgcc ggcctttgaa 1380 tgggttcatgtgcagctcca tcagcaaaag gggatgataa gtttatcacc accgactatt 1440 tgcaacagtgccgttgatcg tgctatgatc gactgatgtc atcagcggtg gagtgcaatg 1500 tcgtgcaatacgaatggcga aaagccgagc tcatcggtca gcttctcaac cttggggtta 1560 cccccggcggtgtgctgctg gtccacagct ccttccgtag cgtccggccc ctcgaagatg 1620 ggccacttggactgatcgag gccctgcgtg ctgcgctggg tccgggaggg acgctcgtca 1680 
tgccctcgtggtcaggtctg gacgacgagc cgttcgatcc tgccacgtcg cccgttacac 1740 cggaccttggagttgtctct gacacattct ggcgcctgcc aaatgtaaag cgcagcgccc 1800 atccatttgcctttgcggca gcggggccac aggcagagca gatcatctct gatccattgc 1860 ccctgccacctcactcgcct gcaagcccgg tcgcccgtgt ccatgaactc gatgggcagg 1920 tacttctcctcggcgtggga cacgatgcca acacgacgct gcatcttgcc gagttgatgg 1980 caaaggttccctatggggtg ccgagacact gcaccattct tcaggatggc aagttggtac 2040 gcgtcgattatctcgagaat gaccactgct gtgagcgctt tgccttggcg gacaggtggc 2100 tcaaggagaagagccttcag aaggaaggtc cagtcggtca tgcctttgct cggttgatcc 2160 gctcccgcgacattgtggcg acagccctgg gtcaactggg ccgagatccg ttgatcttcc 2220 tgcatccgccagagggcggg atgcgaagaa tgcgatgccg ctcgccagtc gattggctga 2280 gctcatgagcggagaacgag atgacgttgg aggggcaagg tcgcgctgat tgctggggca 2340 acacgtggagcggatcgggg attgtctttc ttcagctcgc tgatgatatg ctgacgctca 2400 atgccgtttggactagtgtc gaccaaccag ataagtgaaa tctagttcca aactattttg 2460 tcatttttaattttcgtatt agcttacgac gctacaccca gttcccatct attttgtcac 2520 tcttccctaaataatcctta aaaactccat ttccacccct cccagttccc aactattttg 2580 tccgcccacaccgtaagatg cttttctgtg act 2613 2 6114 DNA Artificial GPS-Apra-2um-URA3transposon vector 2 ggtaccctgt gaatgcgcaa accaaccctt ggcagaacatatccatcgcg tccgccatct 60 ccagcagccg cacgcggcgc atctcgggca gcgttgggtcctggccacgg gtgcgcatga 120 tcgtgctcct gtcgttgagg acccggctag gctggcggggttgccttact ggttagcaga 180 atgaatcacc gatacgcgag cgaacgtgaa gcgactgctgctgcaaaacg tctgcgacct 240 gagcaacaac atgaatggtc ttcggtttcc gtgtttcgtaaagtctggaa acgcggaagt 300 cagcgccctg caccattatg ttccggatct atgtcgggtgcggagaaaga ggtaatgaaa 360 tggcagatcc ctggcttgtt gtccacaacc gttaaaccttaaaagcttta aaagccttat 420 atattctttt ttttcttata aaacttaaaa ccttagaggctatttaagtt gctgatttat 480 attaatttta ttgttcaaac atgagagctt agtacgtgaaacatgagagc ttagtacgtt 540 agccatgaga gcttagtacg ttagccatga gggtttagttcgttaaacat gagagcttag 600 tacgttaaac atgagagctt agtacgtgaa acatgagagcttagtacgta ctatcaacag 660 gttgaactgc tgatcttcgg atctatgtcg ggtgcggagaaagaggtaat gaaatggcat 720 ccggatctgc atcgcaggat 
gctgctggct accctgtggaacacctacat ctgtattaac 780 gaagcgctgg cattgaccct gagtgatttt tctctggtcccgccgcatcc ataccgccag 840 ttgtttaccc tcacaacgtt ccagtaaccg ggcatgttcatcatcagtaa cccgtatcgt 900 gagcatcctc tctcgtttca tcggtatcat tacccccatgaacagaaatc ccccttacac 960 ggaggcatca gtgaccaaac aggaaaaaac cgcccttaacatggcccgct ttatcagaag 1020 ccagacatta acgcttctgg agaaactcaa cgagctggacgcggatgaac aggcagagct 1080 cttactgtca tgccatccgt atgtgggcgg acaataaagtcttaaactga acaaaataga 1140 tctaaactat gacaataaag tcttaaacta gacagaatagttgtaaactg aaatcagtcc 1200 agttatgctg tgaaaaagca tactggactt ttgttatggctaaagcaaac tcttcatttt 1260 ctgaagtgca aattgcccgt cgtattaaag aggggcgtggggtcgacgcg gccgcgaatt 1320 ctgaaccagt cctaaaacga gtaaatagga ccggcaattcttcaagcaat aaacaggaat 1380 accaattatt aaaagataac ttagtcagat cgtacaataaagctttgaag aaaaatgcgc 1440 cttattcaat ctttgctata aaaaatggcc caaaatctcacattggaaga catttgatga 1500 cctcatttct ttcaatgaag ggcctaacgg agttgactaatgttgtggga aattggagcg 1560 ataagcgtgc ttctgccgtg gccaggacaa cgtatactcatcagataaca gcaatacctg 1620 atcactactt cgcactagtt tctcggtact atgcatatgatccaatatca aaggaaatga 1680 tagcattgaa ggatgagact aatccaattg aggagtggcagcatatagaa cagctaaagg 1740 gtagtgctga aggaagcata cgataccccg catggaatgggataatatca caggaggtac 1800 tagactacct ttcatcctac ataaatagac gcatataagtacgcatttaa gcataaacac 1860 gcactatgcc gttcttctca tgtatatata tatacaggcaacacgcagat ataggtgcga 1920 cgtgaacagt gagctgtatg tgcgcagctc gcgttgcattttcggaagcg ctcgttttcg 1980 gaaacgcttt gaagttccta ttccgaagtt cctattctctagaaagtata ggaacttcag 2040 agcgcttttg aaaaccaaaa gcgctctgaa gacgcactttcaaaaaacca aaaacgcacc 2100 ggactgtaac gagctactaa aatattgcga ataccgcttccacaaacatt gctcaaaagt 2160 atctctttgc tatatatctc tgtgctatat ccctatataacctacccatc cacctttcgc 2220 tccttgaact tgcatctaaa ctcgacctct acattttttatgtttatctc tagtattact 2280 ctttagacaa aaaaattgta gtaagaacta ttcatagagtgaatcgaaaa caatacgaaa 2340 atgtaaacat ttcctatacg tagtatatag agacaaaatagaagaaaccg ttcataattt 2400 tctgaccaat gaagaatcat caacgctatc actttctgttcacaaagtat gcgcaatcca 
2460 catcggtata gaatataatc ggggatgcct ttatcttgaaaaaatgcacc cgcagcttcg 2520 ctagtaatca gtaaacgcgg gaagtggagt caggctttttttatggaaga gaaaatagac 2580 accaaagtag ccttcttcta accttaacgg acctacagtgcaaaaagtta tcaagagact 2640 gcattataga gcgcacaaag gagaaaaaaa gtaatctaagatgctttgtt agaaaaatag 2700 cgctctcggg atgcattttt gtagaacaaa aaagaagtatagattctttg ttggtaaaat 2760 agcgctctcg cgttgcattt ctgttctgta aaaatgcagctcagattctt tgtttgaaaa 2820 attagcgctc tcgcgttgca tttttgtttt acaaaaatgaagcacagatt cttcgttggt 2880 aaaatagcgc tttcgcgttg catttctgtt ctgtaaaaatgcagctcaga ttctttgttt 2940 gaaaaattag cgctctcgcg ttgcattttt gttctacaaaatgaagcaca gatgcttcgt 3000 taacaaagat atgctattga agtgcaagat ggaaacgcagaaaatgaacc ggggatgcga 3060 cgtgcaagat tacctatgca atagatgcaa tagtttctccaggaaccgaa atacatacat 3120 tgtcttccgt aaagcgctag actatatatt attatacaggttcaaatata ctatctgttt 3180 cagggaaaac tcccaggttc ggatgttcaa aattcaatgatgggtaacaa gtacgatcgt 3240 aaatctgtaa aacagtttgt cggatattag gctgtatctcctcaaagcgt attcgaatat 3300 cattgagaag ctgcagcgtc acatcggata ataatgatggcagccattgt agaagtgcct 3360 tttgcatttc tagtctcttt ctcggtctag ctagttttactacatcgcga agatagaatc 3420 ttagatcaca ctgcctttgc tgagctggat caatagagtaacaaaagagt ggtaaggcct 3480 cgttaaagga caaggacctg agcggaagtg tatcgtacagtagacggagt atactagtat 3540 agtctatagt ccgtggaatt ctcatgtttg acagcttatcatcgataagc ttttcaattc 3600 aattcatcat ttttttttta ttcttttttt tgatttcggtttctttgaaa tttttttgat 3660 tcggtaatct ccgaacagaa ggaagaacga aggaaggagcacagacttag attggtatat 3720 atacgcatat gtagtgttga agaaacatga aattgcccagtattckyrrc cgcwwytgca 3780 cagaacaaaa acctgcagga aacgaagata aatcatgtcgaaagctacat ataaggaacg 3840 tgctgctact catcctagtc ctgttgctgc caagctatttaatatcatgc acgaaaagca 3900 aacaaacttg tgtgcttcat tggatgttcg taccaccaaggaattactgg agttagttga 3960 agcattaggt cccaaaattt gtttactaaa aacacatgtggatatcttga ctgatttttc 4020 catggagggc acagttaagc cgctaaaggc attatccgccaagtacaatt ttttactctt 4080 cgaagacaga aaatttgctg acattggtaa tacagtcaaattgcagtact ctgcgggtgt 4140 atacagaata gcagaatggg cagacattac 
gaatgcacacggtgtggtgg gcccaggtat 4200 tgttagcggt ttgaagcagg cggcagaaga agtaacaaaggaacctagag gccttttgat 4260 gttagcagaa ttgtcatgca agggctccct atctactggagaatatacta agggtactgt 4320 tgacattgcg aagagcgaca aagattttgt tatcggctttattgctcaaa gagacatggg 4380 tggaagagat gaaggttacg attggttgat tatgacacccggtgtgggtt tagatgacaa 4440 gggagacgca ttgggtcaac agtatagaac cgtggatgatgtggtctcta caggatctga 4500 cattattatt gttggaagag gactatttgc aaagggaagggatgctaagg tagagggtga 4560 acgttacaga aaagcaggct gggaagcata tttgagaagatgcggccagc aaaactaaaa 4620 aactgtatta taagtaaatg catgtatact aaactcacaaattagagctt caatttaatt 4680 atatcagtta ttacccggga atctcggtcg taatgatttttataatgacg aaaaaaaaaa 4740 aattggaaag aaaaagcttt aatgcggtag tttatcacagttaaattgct aacgcagtca 4800 ggcaccggcg gccgcgccgt atttgcagta ccagcgtacggcccacagaa tgatgtcacg 4860 ctgaaaatgc cggcctttga atgggttcat gtgcagctccatcagcaaaa ggggatgata 4920 agtttatcac caccgactat ttgcaacagt gccgttgatcgtgctatgat cgactgatgt 4980 catcagcggt ggagtgcaat gtcgtgcaat acgaatggcgaaaagccgag ctcatcggtc 5040 agcttctcaa ccttggggtt acccccggcg gtgtgctgctggtccacagc tccttccgta 5100 gcgtccggcc cctcgaagat gggccacttg gactgatcgaggccctgcgt gctgcgctgg 5160 gtccgggagg gacgctcgtc atgccctcgt ggtcaggtctggacgacgag ccgttcgatc 5220 ctgccacgtc gcccgttaca ccggaccttg gagttgtctctgacacattc tggcgcctgc 5280 caaatgtaaa gcgcagcgcc catccatttg cctttgcggcagcggggcca caggcagagc 5340 agatcatctc tgatccattg cccctgccac ctcactcgcctgcaagcccg gtcgcccgtg 5400 tccatgaact cgatgggcag gtacttctcc tcggcgtgggacacgatgcc aacacgacgc 5460 tgcatcttgc cgagttgatg gcaaaggttc cctatggggtgccgagacac tgcaccattc 5520 ttcaggatgg caagttggta cgcgtcgatt atctcgagaatgaccactgc tgtgagcgct 5580 ttgccttggc ggacaggtgg ctcaaggaga agagccttcagaaggaaggt ccagtcggtc 5640 atgcctttgc tcggttgatc cgctcccgcg acattgtggcgacagccctg ggtcaactgg 5700 gccgagatcc gttgatcttc ctgcatccgc cagagggcgggatgcgaaga atgcgatgcc 5760 gctcgccagt cgattggctg agctcatgag cggagaacgagatgacgttg gaggggcaag 5820 gtcgcgctga ttgctggggc aacacgtgga gcggatcggggattgtcttt cttcagctcg 5880 
ctgatgatat gctgacgctc aatgccgttt ggactagtgtcgaccaacca gataagtgaa 5940 atctagttcc aaactatttt gtcattttta attttcgtattagcttacga cgctacaccc 6000 agttcccatc tattttgtca ctcttcccta aataatccttaaaaactcca tttccacccc 6060 tcccagttcc caactatttt gtccgcccac accgtaagatgcttttctgt gact 6114 3 16 DNA Artificial primer 3 actggccgtc gtttta 16 416 DNA Artificial primer 4 catggtcata gctgtt 16

1. A process for parallel, transposon-mediated sequencing of DNA, the process comprising: i) providing one or more target DNA sequences, each inserted into a vector at an insertion site; ii) amplifying each target DNA-containing vector and pooling them, wherein each target DNA sequence is represented in the pool in a substantially equal amount per kb; iii) exposing the pool of target DNA-containing vectors to a selectable transposon, wherein the selectable transposon integrates into the target DNA-containing vectors at substantially random sites to form a pool of target DNA- and transposon-containing vectors; iv) transforming cells with the pool of target DNA- and transposon-containing vectors and isolating and growing a representative number of individual transformants into cultures under selection conditions; v) performing a polymerase chain reaction on DNA from each culture, wherein the polymerase chain reaction uses a pair of primers complementary to the 3′ ends of the vector sequence at the insertion site and has an extension time during each reaction cycle sufficient to efficiently produce a full-length copy of the vector sequence but too short to efficiently produce a full-length copy of the vector sequence with the transposon inserted; vi) measuring the amount of DNA produced in each polymerase chain reaction; and vii) sequencing the transposon flanking regions of the target DNA-containing vectors from those cultures corresponding to polymerase chain reactions which produced substantial amounts of DNA.
 2. The process of claim 1, wherein one or more of steps i) to vii) are automated.
 3. The process of claim 1, wherein the step of measuring the amount of DNA produced in each polymerase chain reaction comprises adding a fluorescent stain selective for dsDNA to the finished polymerase chain reaction and measuring the resulting amount of fluorescence.
 4. The process of claim 3, wherein one or more of steps i) to vii) are automated.
 5. The process of claim 3, wherein the fluorescent stain selective for dsDNA is selected from the group consisting of PICOGREEN® and bisbenzimide dyes.
 6. The process of claim 3, wherein the fluorescent stain selective for dsDNA is PICOGREEN®.
 7. The process of claim 1, wherein the vector and transposon carry different selectable markers.
 8. The process of claim 1, wherein the transposon is at least 3 kb.
 9. The process of claim 8, wherein the transposon is at least 4 kb.
 10. The process of claim 9, wherein the transposon is GPS-Apra-2 um-URA3.
 11. The process of claim 8, wherein the transposon is at least 5 kb.
 12. The process of claim 1, wherein the transposon is equal to or greater than the vector in length.
 13. The process of claim 12, wherein the transposon is at least about 5/4 the length of the vector.
 14. The process of claim 13, wherein the transposon is about 5 kb and the vector is about 4 kb.
 15. The process of claim 12, wherein the transposon is at least about 5/3 the length of the vector.
 16. The process of claim 15, wherein the transposon is about 5 kb and the vector is about 3 kb. 